[Feature] : Add REST API endpoints for access through AI #1117
Conversation
ab72a13 to
1441393
Compare
1f3c1d5 to
094f3af
Compare
|
|
While @canihavesomecoffee has a bit of time to review, some notes:
I asked claude to review it. Take a look: CRITICAL C1 — Missing-output detection is broken; failing runs report pass. (verified) Consequence in derive_sample_status (status.py:57-69): empty result_files → both loops skip → if exit code matches → returns pass. Cascades into run status, /summary counts, /errors, /error-summary. The whole point of this platform is catching regressions; this silently passes tests that produced no output. The "DEPLOYMENT PREREQUISITE" SQL in the docstring is also misleading — it checks for old rows that were likewise never persisted. Must rewrite missing-output derivation against expected RegressionTestOutputs. HIGH H1 — Privilege escalation: any user can revoke anyone's tokens. (verified) routes/auth.py:64 puts tokens:manage in allowed_scopes for every role (only baselines:write is admin-gated). revoke_specific_token (auth.py:175) has no @require_roles; its only cross-user guard is has_scope('tokens:manage') — which a plain user now holds. → low-priv user can revoke admins' tokens (lockout/DoS). Fix: gate tokens:manage to admins, or restrict non-admins to revoking their own tokens regardless of scope. H2 — POST /runs ownership bypass → arbitrary-repo CI builds. routes/runs.py:116 _validate_run_permissions: is_staff (admin/tester/contributor) short-circuits the ownership check, and repository is only regex-validated, never confirmed to exist/be-owned. A contributor can enqueue a run against any owner/repo string, creating Fork rows and making build VMs clone/build attacker-chosen repos. SSRF/resource-abuse surface. H3 — 'testuser' short-circuit poisons real data. mod_auth/controllers.py: fetch_username_from_token returns 'testuser' whenever config['TESTING'] is truthy, and github_callback now persists github_login = 'testuser' to the real user row. TESTING is env-var controlled (run.py:109). If ever set in a prod-ish env, it corrupts the value H2's owner-check trusts. Don't couple security-relevant data to TESTING, and never write the stub to the DB. H4 — API 500s return HTML, not JSON. middleware/error_handler.py:101 @mod_api.errorhandler(500) never fires — Flask doesn't dispatch the 500 code to blueprint handlers; the app-level handler renders 500.html. after_app_request only JSON-ifies 404/405. Any unhandled exception breaks the "always JSON" contract. The unit test calls the handler directly so it passes. Fix at app level for /api/v1 paths. (The MarshmallowValidationError/SQLAlchemyError class handlers do work — only the 500 one is dead.) H5 — verify_schemathesis.py (938 lines) is dead. CI uses nose2 which collects test*.py — this file (verify_*) is never collected, and it imports schemathesis/hypothesis which aren't in requirements (ImportError if run). The headline "contract test" provides zero coverage. Wire it in or drop it. H6 — N+1 in get_sample_history. routes/samples.py:318 calls get_run_timestamps → batch_get_run_data([test]) per row = 3 queries × N runs, re-querying already-loaded data. Defeats the PR's own batching claim. Batch once over the distinct tests. H7 — POST /runs/{id}/cancel check-then-act race. routes/runs.py:481 reads status, then inserts a canceled TestProgress with no lock/re-check → duplicate cancels, or canceled written after completed (flips a finished run). "Idempotent" only holds on the already-terminal fast path. Use with_for_update() and re-check. MEDIUM (condensed)
LOW/NIT No Cache-Control: no-store on the one-time token response; timing oracle on the no-candidate auth path (no dummy verify, unlike the login path); no unique constraint on token_prefix; extension sanitization brittle (_safe_resolve saves it); progress.step hardcoded None; HSTS sent unconditionally. Bottom line: C1 alone blocks merge — the API would misreport CI results, defeating the platform's purpose. H1–H4 are real security/contract bugs. Architecture and tests are solid; this is a fixable PR, not a rewrite. Recommend: request changes, lead with C1, require the dep bumps be split out. Want me to (a) post this as a PR review on GitHub, (b) save it to plans/PR_1117_REVIEW.md, or (c) deep-dive/repro any specific finding (e.g. write a failing test that proves C1)? |



Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].
In raising this pull request, I confirm the following (please check boxes):
My familiarity with the project is as follows (check one):
Feature: JSON REST API for the CCExtractor CI Platform (
mod_api)Summary
This PR introduces a complete, spec-driven JSON REST API for the CCExtractor Sample Platform, mounted at
/api/v1. It replaces the need for scrapers or direct database access by exposing every CI data point (runs, samples, results, diffs, logs, errors, artifacts, queue status) through authenticated, validated, and rate-limited endpoints.The API is fully described by the
openapi-ci-api.yamlspecification and validated against it using Schemathesis property-based testing. It integrates cleanly with the existing data model (mod_test,mod_regression,mod_sample,mod_auth) without modifying any of those modules.Motivation
The existing platform serves server-rendered HTML pages and lacks a programmatic interface. This makes it impossible for:
This API solves all of the above by providing a standard REST interface with proper authentication, authorization, pagination, filtering, and error handling.
Architecture
The module is organized into five layers, each with a single responsibility:
Why this structure matters
before_request/after_requesthooks, so route handlers never contain auth, validation, or rate-limit boilerplateload) and output serialization (dump), ensuring the API contract matches the OpenAPI specservices/status.pyto prevent inconsistent status logic across endpointsEndpoints
Authentication (
/auth/tokens)POST/auth/tokensGET/auth/tokenstokens:manage?all=true)DELETE/auth/tokens/currentDELETE/auth/tokens/{id}Runs (
/runs)GET/runsruns:readplatform,branch,commit_sha,repository,status,created_after/before,sortPOST/runsruns:writeGET/runs/{id}runs:readGET/runs/{id}/summaryruns:readGET/runs/{id}/progressruns:readGET/runs/{id}/configruns:readPOST/runs/{id}/cancelruns:writeSamples (
/samples,/runs/{id}/samples,/regression-tests)GET/runs/{id}/samplesruns:readGET/runs/{id}/samples/{sid}runs:readGET/samplesruns:readname,extension,tag,sha256,statusGET/samples/{id}runs:readGET/samples/{id}/historyruns:readGET/regression-testsruns:readactive,category,tag,sample_idfiltersResults (
/runs/{id}/samples/{sid}/regression-tests/{rid}/outputs/{oid})GET.../expectedresults:readGET.../actualresults:readGET.../diffresults:readPOST/runs/{id}/samples/{sid}/baseline-approvalbaselines:write+ admin/contributorErrors & Logs (
/runs/{id}/errors,/runs/{id}/logs)GET/runs/{id}/errorsresults:readGET/runs/{id}/infrastructure-errorssystem:readGET/runs/{id}/error-summaryresults:readGET/runs/{id}/logssystem:readlevel,source,containsfiltersGET/runs/{id}/samples/{sid}/logssystem:readSystem (
/system)GET/system/healthGET/system/queuesystem:readGET/runs/{id}/artifactsresults:readSecurity
Token Authentication
spci_for easy identification in logs and secret scannersScope-Based Authorization
Six scopes control access granularity:
runs:readruns:writeresults:readbaselines:writesystem:readtokens:manageNon-admin users cannot request
baselines:write. The scope check returns 403 with details about which scopes are missing and which scopes the token holds, making debugging straightforward.Role Enforcement
Certain operations additionally require specific user roles:
POST /runs/{id}/cancel— admin, contributor, or testerPOST /runs/{id}/samples/{sid}/baseline-approval— admin or contributorPOST /runs(Triggering runs):runs:writescope, regardless of the repository.admin,contributor, ortester. To trigger runs for your own fork repository, any authenticated user (who owns the fork and hasruns:writescope) can do so.Rate Limiting
Per-client sliding window rate limits protect against abuse:
POST /auth/tokensRate limit headers (
X-RateLimit-Limit,X-RateLimit-Remaining,X-RateLimit-Reset,Retry-After) are included on every response. A background eviction process prunes stale entries every 100 requests to bound memory usage.Rate limiting is automatically disabled when
app.config['TESTING']isTrueto prevent interference with test suites. In production, theTESTINGflag is never set.Security Headers
Every API response includes:
Strict-Transport-Security: max-age=31536000; includeSubDomainsContent-Security-Policy: default-src 'none'; frame-ancestors 'none'X-Content-Type-Options: nosniffX-Frame-Options: DENYPath Traversal Protection
All file-serving endpoints (
/expected,/actual,/diff) use_safe_resolve()which callsos.path.realpath()and verifies the resolved path is within theTestResultsbase directory. File extensions from the database are sanitized to remove/,\, and..sequences before being appended to filenames.Input Validation
All validation happens at the middleware layer through reusable decorators:
@validate_body(Schema)— Validates JSON body against a Marshmallow schema. Returns 415 for non-JSON content types, 400 for malformed JSON, and 400 with field-level error details for schema violations@validate_offset_pagination()— Extracts and validateslimit(1–100) andoffset(≥0, ≤2147483647). Rejects requests that mix cursor and offset pagination@validate_cursor_pagination()— Extracts and validateslimitandcursorfor log endpoints@validate_path_id('param')— Ensures URL path parameters are positive integers within int32 range@validate_date_range— Parsescreated_after/created_beforeISO 8601 params and rejects inverted ranges@validate_sort()— Validates thesortquery parameter against a whitelist (created_at,-created_at,run_id,-run_id)GET /samplesname filter escapes%and_to prevent SQL LIKE injectionStatus Derivation
Status logic is centralized in
services/status.pyto prevent inconsistencies:Run Status
The existing
test.failedproperty only checks forTestStatus.canceledand is not reliable for determining whether regression tests passed. Instead, we derive status from the latestTestProgressrow and, for completed runs, count actual failures fromTestResultrows:queuedrunningpreparationortestingcanceledcanceledpasscompleted+ zero failuresfailcompleted+ one or more failuresincompleteSample Status
passfailmissing_outputnot_startedDummy Row Detection
The CI worker writes a sentinel
TestResultFilerow withregression_test_output_id = -1andgot = 'error'when a test produces no output. This row is filtered from all API responses and drives themissing_outputstatus.Pagination
Two pagination strategies are used:
Offset Pagination (most endpoints)
{ "data": [...], "pagination": { "limit": 50, "offset": 0, "total": 142, "next_offset": 50 } }Cursor Pagination (build logs)
{ "data": [...], "pagination": { "limit": 100, "next_cursor": 3847 } }Cursor pagination is used for logs because they are read from files, not from the database, and offset-based access would require reading and discarding lines on every request.
Error Response Format
All errors follow a consistent structure:
{ "code": "validation_error", "message": "Request failed schema validation.", "details": { "fields": { "email": ["Missing data for required field."], "password": ["Shorter than minimum length 1."] } } }HTTP exceptions (404, 405, 500) are caught and reformatted into this structure so that API clients never receive HTML error pages.
Performance Considerations
Batch Status Derivation
batch_get_run_data()inservices/status.pypreloads allTestProgress,TestResult, andTestResultFilerows for a batch of runs in three queries (instead of N+1), then computes statuses in-memory. This is critical for theGET /runslist endpoint.Status Filtering Caveat
When
GET /runs?status=runningis requested, the API must load up to 1,000 runs and compute their derived status in-memory (because status is not a database column). Atruncated: trueflag is included in the pagination response when the 1,000-run cap is hit. This is documented in the response schema.Result File Preloading
Endpoints that iterate over
TestResultrows (run summary, run samples, sample history) usedefaultdict-based preloading ofTestResultFilerows to avoid N+1 queries.Diff Size Limits
The diff endpoint rejects files larger than 10 MiB and directs clients to the download URL instead. Output files are truncated at 1 MiB with a
truncated: trueflag and a GCS download link for the full file.Database Changes
New Table:
api_tokenidINT PRIMARY KEYuser_idINT FK → user.idtoken_nameVARCHAR(50)uq_user_token_name)token_hashVARCHAR(255)token_prefixVARCHAR(16)scopes_jsonTEXTcreated_atDATETIME(tz)expires_atDATETIME(tz)revoked_atDATETIME(tz)No existing tables are modified.
Test Coverage
Test Organization (265 tests total)
Schemathesis (Property-Based Testing)
The
test_schemathesis.pymodule validates every endpoint against the OpenAPI specification using Schemathesis, which generates hundreds of edge-case inputs per endpoint. It covers:Hypothesis is configured with
max_examples=5anddeadline=Noneto keep CI runtime reasonable (~7 minutes locally) while still providing meaningful coverage. Rate limiting is bypassed during testing viaapp.config['TESTING'].What the tests mock
config_parser.parse_config→ returns an in-memory SQLite configgoogle.cloud.storage.Client.from_service_account_json→ returns a mock GCS clientNew Files
mod_api/__init__.pymod_api/utils.pymod_api/models/api_token.pymod_api/middleware/auth.pymod_api/middleware/error_handler.pymod_api/middleware/rate_limit.pymod_api/middleware/security.pymod_api/middleware/validation.pymod_api/schemas/*.pymod_api/services/status.pymod_api/services/diff_service.pymod_api/services/error_service.pymod_api/services/log_service.pymod_api/services/storage.pymod_api/routes/*.pytests/api/*.pyopenapi-ci-api.yamlTotal new code: ~5,500 lines (source) + ~3,000 lines (tests) + ~2,700 lines (OpenAPI spec)
Dependencies
New pip dependencies:
argon2-cffi— Token hashing (argon2id)marshmallow— Schema validation and serializationschemathesis— OpenAPI contract testing (test dependency only)hypothesis— Property-based testing engine (test dependency only, pulled in by schemathesis)All other dependencies (
flask,sqlalchemy,passlib,google-cloud-storage) were already inrequirements.txt.How to Test Locally
Known Limitations & Future Work
GET /runs?status=Xrequires in-memory filtering with a 1,000-run cap. A future improvement would be to materialize run status as a database column updated by a trigger or the CI worker.GET /runs/{id}/samples/{sid}/logsreturns 404 because the CI worker doesn't produce per-sample log files. The endpoint is implemented and documented as a placeholder for when the worker adds this capability.blob.exists()before generating signed URLs (to avoid latency). Clients should handle 404s from the signed URL gracefully.